ABSTRACT. Zipf's law in its basic incarnation is an empirical probability
distribution governing the frequency of usage of words in a language. As Terence
Tao recently remarked, it still lacks a convincing and satisfactory mathematical
explanation.

In this paper I suggest that at least in certain situations, Zipf's law can be 
explained as a special case of the a priori distribution introduced and studied by 
L. Levin. The Zipf ranking corresponding to diminishing probability appears then 
as the ordering determined by the growing Kolmogorov complexity. 

One argument justifying this assertion is the appeal to a recent interpretation by
Yu. Manin and M. Marcolli of asymptotic bounds for error-correcting codes in terms
of phase transition. In the respective partition function, Kolmogorov complexity of
a code plays the role of its energy.
0. Introduction and summary



0.1. Zipf's law. Zipf's law was discovered as an empirical observation ([Zi1],
[Zi2]): if all words $w_k$ of a language are ranked according to decreasing frequency
of their appearance in a representative corpus of texts, then the frequency $p_k$ of
$w_k$ is (approximately) inversely proportional to its rank $k$: see e.g. Fig. 1 in [Ma1]
based upon a corpus containing $4 \cdot 10^7$ Russian words.

For various other incarnations of this power law with exponent $-1$ in many different
statistical data, cf. [MurSo] and references therein.

Theoretical models of Zipf's distribution also abound. In the founding texts of
Zipf himself ([Zi2], [Zi1]), it was suggested that his distribution "minimizes effort".
Mandelbrot in [Mand] described a concise mathematical framework for producing
a model of Zipf's law. Namely, if we postulate and denote by $C_k$ a certain "cost"
(of producing, using etc.) of the word of rank $k$, then the frequency distribution
$p_k \sim 2^{-h^{-1} C_k}$ minimizes the ratio $h = C/H$, where $C := \sum_k p_k C_k$ is the average
cost per word, and $H := -\sum_k p_k \log_2 p_k$ is the average entropy: see [Ma2].
We get from this a power law, if $C_k \sim \log k$. An additional problem, what is
so special about the exponent $-1$, must be addressed separately. For one possibility, see
[MurSo], sec. III. In this work, we suggest a different mathematical background (see
the next subsection).

In all such discussions, it is more or less implicitly assumed that empirically 
observed distributions concern fragments of a potential countable infinity of objects. 
I also postulate this, and work in such a "constructive world": see sec. 1.1 below 
for a precise definition. 

0.2. Zipf's law from complexity. In this note we suggest that (at least in 
some situations) Zipf's law emerges as the combined effect of two factors: 

(A) Rank ordering coincides with the ordering with respect to the growing
(exponential) Kolmogorov complexity $K(w)$ up to a factor $\exp(O(1))$.

More precisely, to define $K(x)$ for a natural number $x \in \mathbf{Z}^+$, we choose a
Kolmogorov optimal encoding, which is a partial recursive function $u : \mathbf{Z}^+ \to \mathbf{Z}^+$, and
put $K(x) = K_u(x) := \min\{y \mid u(y) = x\}$. Another choice of $u$ changes $K_u(\cdot)$ by a
factor $\exp(O(1))$.

Furthermore, $K(w)$ for elements $w$ of a constructive world is defined as the
complexity of its number in a fixed structural numbering (cf. 1.1 below). Changing the
numbering, we again change complexity by an $\exp(O(1))$-factor.

(B) The probability distribution producing Zipf's law (with exponent $-1$) is (an
approximation to) L. Levin's maximal computable from below distribution: see
[ZvLe], [Le] and [LiVi].

If we accept (A) and (B), then Zipf's law follows from two basic properties of
Kolmogorov complexity:

(a) the rank of $w$ defined according to (A) is $\exp(O(1)) \cdot K(w)$;

(b) Levin's distribution assigns to an object $w$ the probability $\sim KP(w)^{-1}$, where $KP$
is the exponentiated prefix Kolmogorov complexity, and we have, up to $\exp(O(1))$-factors,

$$K(w) \preceq KP(w) \preceq K(w) \cdot \log^{1+\varepsilon} K(w)$$

with arbitrary $\varepsilon > 0$.

The slight discrepancy between the growth orders of $K$ and $KP$ is the reason why
a probability distribution on an infinity of objects cannot be constructed from $K$: the
series $\sum_m K(m)^{-1}$ diverges. However, on finite sets of data this small discrepancy
is additionally masked by the dependence of both $K$ and $KP$ on the choice of
an optimal encoding. Therefore, when speaking about Zipf's law, we will mostly
disregard this difference. See also the discussion of the partition function for codes
in 0.4 below.
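The divergence/convergence contrast can be seen numerically in a rough sketch. Since $K$ and $KP$ are not computable, the code below uses stand-ins that are purely my assumptions for illustration: $m$ itself in place of $K(m)$ (for most $m$ the two are comparable), and the upper bound $m \log^{1+\varepsilon} m$ in place of $KP(m)$, with the arbitrary value $\varepsilon = 0.5$.

```python
import math

EPS = 0.5  # illustrative value of epsilon, not taken from the text


def partial_sum(weight, n_terms, start=2):
    """Sum weight(m) for m = start, ..., start + n_terms - 1."""
    return sum(weight(m) for m in range(start, start + n_terms))


def inv_k_proxy(m):
    # Stand-in for K(m)^(-1): a typical m has K(m) comparable with m.
    return 1.0 / m


def inv_kp_proxy(m):
    # Stand-in for KP(m)^(-1), using the bound m * (log m)^(1 + eps).
    return 1.0 / (m * math.log(m) ** (1.0 + EPS))


for n in (10**3, 10**4, 10**5):
    print(n, round(partial_sum(inv_k_proxy, n), 3),
          round(partial_sum(inv_kp_proxy, n), 3))
```

Each additional decade adds roughly $\log 10$ to the first column, whereas the increments of the second column shrink; the convergence of the second series is nevertheless very slow, which is in line with the remark that the discrepancy is hard to see on finite data.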

0.3. Complexity as effort. The picture described above agrees with Zipf's
motto "minimization of effort", but reinterprets the notion of effort: its role is now
played by the logarithm of the Kolmogorov complexity, that is, by the length of the
maximally compressed description of an object. This length is not computable, but
it is the infimum of a sequence of computable functions.
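A minimal sketch of what "infimum of a sequence of computable functions" can look like in practice: every concrete compressor gives a computable upper bound (up to an additive constant for the decompressor) on the length of the shortest description, and taking the running minimum over more and more compressors can only decrease the bound. The choice of the three standard-library compressors and of the test strings is my assumption; no finite family of compressors reaches the true infimum.

```python
import bz2
import lzma
import random
import zlib


def description_length_bounds(data: bytes) -> list:
    """Running minimum, in bits, over a few computable compressed sizes."""
    candidates = [
        8 * len(data),                     # trivial description: the data itself
        8 * len(zlib.compress(data, 9)),
        8 * len(bz2.compress(data)),
        8 * len(lzma.compress(data)),
    ]
    bounds, best = [], candidates[0]
    for size in candidates:
        best = min(best, size)             # the bound never increases
        bounds.append(best)
    return bounds


rng = random.Random(0)
regular = b"ab" * 5000                                   # highly compressible
noisy = bytes(rng.getrandbits(8) for _ in range(10000))  # pseudo-random bytes

print(description_length_bounds(regular))
print(description_length_bounds(noisy))
```

The regular string gets a bound far below its raw size, while for the pseudo-random string none of the compressors improves on the trivial description.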

Such a picture makes sense especially if the objects satisfying Zipf's distribution
are generated rather than simply observed.

Intuitively, whenever an individual mind, or a society, finds a compressed
description of something, this something becomes usable, and is used more often than
another "something" whose description is longer. For an expanded version of
this metaphor applied to the history of science, see [Man3].

For words in Zipf's initial observation, this principle refers to the ways in which
the mind/brain generates and uses language.

0.4. Relation to previous works. I am aware of two works where complexity 
is invoked in relation to Zipf's law: [Ve] and [MurSo]. 

(a) Briefly, the viewpoint of [MurSo] is close to ours, but, roughly speaking, the
authors focus on the majority of objects consisting of "almost random" ones: those
whose size is comparable with their Kolmogorov complexity, and which therefore cannot
be compressed. This is justified by the authors' assumption that the data corpus
satisfying Zipf's law comes from a sequence of successive observations over a certain
system with stochastic behaviour.

To the contrary, our ranking puts in the foreground those objects whose size 
might be very large in comparison with their complexity, because we imagine sys- 
tems that are generated rather than simply observed, in the same sense as texts 
written in various languages are generated by human brains. 

To see the crucial difference between the two approaches on a well understood
mathematical example, one can compare them against the background of error-correcting
codes, following [ManMar]. Each such code $C$ (over a fixed alphabet) determines
a point in the unit square of the plane (transmission rate, minimal relative Hamming
distance). The closure of all limit code points is a domain lying below a
certain continuous curve which is called the asymptotic bound.

If one produces codes in the order of growing size, most code points will form
a cloud densely approximating the so-called Varshamov-Gilbert bound that lies
strictly below the asymptotic bound.
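For orientation, here is a short sketch of the curve in question. The text does not write it out; I assume the standard asymptotic form $R = 1 - H_q(\delta)$ for $0 \le \delta \le (q-1)/q$, where $H_q$ is the $q$-ary entropy function, $\delta$ is the relative distance and $R$ the transmission rate. The function names are hypothetical.

```python
import math


def q_ary_entropy(delta, q):
    """H_q(delta) for 0 <= delta < 1, with H_q(0) = 0."""
    if delta == 0:
        return 0.0
    return (delta * math.log(q - 1, q)
            - delta * math.log(delta, q)
            - (1 - delta) * math.log(1 - delta, q))


def gv_rate(delta, q=2):
    """Rate on the asymptotic Varshamov-Gilbert curve, 0 beyond (q-1)/q."""
    if delta >= (q - 1) / q:
        return 0.0
    return 1.0 - q_ary_entropy(delta, q)


for delta in (0.0, 0.05, 0.11, 0.25, 0.5):
    print(delta, round(gv_rate(delta), 4))
```

For binary codes the curve passes close to the point $\delta = 0.11$, $R = 0.5$.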

To the contrary, if one produces codes in the order of their Kolmogorov complexity,
their code points will well approximate the picture of the whole domain under
the asymptotic bound: see details in [ManMar]. Moreover, Levin's distribution very
naturally leads to a thermodynamic partition function on the set of codes, and to
the interpretation of the asymptotic bound as a phase transition curve: this partition
function has the form $\sum_C K(C)^{-s(C)}$, where $s(C)$ is a certain function defined on
codes and including as parameters analogs of temperature and density. Here one
may replace $K$ with $KP$, and freely choose the optimal family defining complexity:
this will have no influence at all on the form of the phase curve/asymptotic bound.

Moreover, $\log K(w)$, that is, the bit-size of a maximally compressed description
of $w$, plays precisely the role of energy in this partition function, thus validating
our suggestion to identify it with "effort".

It is interesting to observe that the mathematical problem of generating good
error-correcting codes historically made great progress in the 1980s with the
discovery of algebraic geometric Goppa codes, that is, precisely with the discovery
of greatly compressed descriptions of large combinatorial objects.

To summarize, the class of a priori probability distributions that we are
considering here is qualitatively distinct from those that now form a common stock of
sociological and sometimes scientific analysis: cf. a beautiful synopsis of the latter
by Terence Tao in [Ta], who also stresses that "mathematicians do not have a fully
satisfactory and convincing explanation for how the [Zipf] law comes about and
why it is universal".

(b) We turn now to the paper [Ve], in which T. Veldhuizen considers Zipf's law
in an unusual context that did not exist in the days when Kolmogorov, Solomonoff
and Chaitin made their discoveries, but which provides, in a sense, a landscape for
an industrial incarnation of complexity. Namely, he studies actual software and
software libraries and analyzes possible profits from software reuse. Metaphorically,
this is a picture of human culture whose everyday existence depends on a continuous
reuse of treasures created by researchers, poets, philosophers.

Mathematically, reuse furnishes new tools of compression: roughly speaking, a
function $f$ may have a very large Kolmogorov complexity, but the length of the
library address of its program may be short, and only the latter counts if one can
simply copy the program from the library.

In order to create a mathematical model of reuse and its Zipf landscape along
the lines of this note, I need to define the mathematical notion of relative
Kolmogorov complexity $K(f \mid F)$. This notion goes back to Kolmogorov himself and is
well known in the case when $f$, $F$ are finite combinatorial objects such as strings or
integers (cf. [LiVi]).
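For finite strings, the effect of a library on description length can be imitated with a real compressor. The sketch below is only an analogy under my own assumptions: the "library" $F$ is a byte string passed to `zlib` as a preset dictionary, the "function" $f$ is one of the pseudo-random byte strings stored in it, and compressed size stands in for (the logarithm of) complexity.

```python
import random
import zlib


def compressed_size(data: bytes, library: bytes = b"") -> int:
    """Bytes needed to encode data, optionally relative to a preset dictionary."""
    if library:
        compressor = zlib.compressobj(level=9, zdict=library)
    else:
        compressor = zlib.compressobj(level=9)
    return len(compressor.compress(data) + compressor.flush())


rng = random.Random(1)
# Three hypothetical "library routines": incompressible on their own.
routines = [bytes(rng.getrandbits(8) for _ in range(2000)) for _ in range(3)]
library = b"".join(routines)
f = routines[1]

print("without library:", compressed_size(f))
print("with library:   ", compressed_size(f, library))
```

Without the library the string is essentially incompressible; with it, the encoding shrinks to a few back-references, which play the role of the short library address.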

In the body of the paper, I generalize this definition to the case of a library F 
of programs. The library may even contain uncomputable oracular data, and thus 
we include into the complexity landscape oracle-assisted computations. 

0.5. Some justifications. Consider some experimental data demonstrating the
dependence of Zipf's rank on complexity in the most natural environment: when
we study the statistics not of all words, but only of numerals, the names of numbers.

Then in our model we expect that: 

(i) Most of the numbers $n$, those that are Kolmogorov "maximally complex", will
appear with probability comparable with $n^{-1}(\log n)^{-1-\varepsilon}$, with a small $\varepsilon$: "most
large numbers appear with frequency inverse to their size" (in fact, a somewhat
smaller one).

(ii) However, frequencies of those numbers that are Kolmogorov very simple,
such as $10^3$ (thousand), $10^6$ (million), $10^9$ (billion), must produce sharp local peaks
in the graph of $(p_n)$: cf. more detailed remarks in 0.6 below.

The reader may compare these properties of the discussed class of Levin's
distributions, which can be called a priori distributions, with the observed frequencies
of numerals in printed and oral texts in several languages, summarized in Dehaene,
[De], p. 111, Figure 4.4. (Those parts of the Dehaene and Mehler graphs in the
book [De] that refer to large numbers are somewhat misleading: they might create
an impression that frequencies of the numerals, say, between $10^6$ and $10^9$ smoothly
interpolate between those of $10^6$ and $10^9$ themselves, whereas in fact they abruptly
drop down. See, however, a much more detailed discussion in [DeMe].)

To me, their qualitative agreement looks convincing: brains and their societies
do follow predictions of a priori probabilities. Of course, one has to remember that
compression degrees that can be achieved by brains and civilisations might produce
quantitatively different distributions at the initial segments of a Kolmogorov
Universe, because here the dependence of the complexity of objects on the complexity of
the mechanisms generating them becomes much more pronounced. In fact, this effect
is also observed in the empirical data.

There is no doubt that many instances of empirical Zipf's laws will not be reducible
to our complexity source. Such a reduction of the Zipf law for all words might
require for its justification some neurobiological data: cf. [Ma1], appendix A in the
arXiv version.

Another interesting possible source of Zipf's law was considered in a recent paper
[FrChSh]. The authors suggested that Zipf's rank of an object, a member of a certain
universe, might coincide with its PageRank with respect to an appropriate directed
network connecting all objects. This mechanism generally produces a power law,
but not necessarily exactly Zipf's one.

In any case, the appeal to the uncomputable degree of maximal compression in 
our model of Zipf-Levin distribution is exactly what can make such a model an 
eye-opener. 

0.6. Fractal landscape of the Kolmogorov complexity and universality 
of Zipf's law. A graph of logarithmic Kolmogorov complexity of integers k (and 
its prefix versions) looks as follows: most of the time it follows closely the graph of 
log k, but infinitely often it drops down, lower than any given computable function: 
see [LiVi], pp. 103, 105, 178. The visible "continuity" of this graph reflects the 
fact that complexity of k + 1 in any reasonable encoding is almost the same as 
complexity of k. 

However, such a picture cannot convey the extremely rich self-similarity properties
of complexity. The basic fractal property is this: if one takes any infinite decidable
subset of $\mathbf{Z}^+$ in increasing order and restricts the complexity graph to this subset,
one will get the same complexity relief as for the whole $\mathbf{Z}^+$: in fact, for any recursive
bijection $f$ of $\mathbf{Z}^+$ with a subset of $\mathbf{Z}^+$ we have $K(f(x)) = \exp(O(1)) \cdot K(x)$.

Seemingly, this source of "fractalization" might have a much wider influence: see 
[NaWe] and related works. 

If we pass from complexity to a Levin's distribution, that is, basically, invert the 
values of complexity, these fractal properties survive. 

This property might be accountable for the "universality" of Zipf's law, because it
can be read as its extreme stability with respect to the passage to various
sub-universes of objects, computable renumbering of objects etc. In the same way, the
picture of random noise in a stable background is held responsible for the universality
of the normal distribution.

0.7. Plan of the paper. In the main body of the paper, I do not argue 
anymore that complexity may be a source of Zipf's distribution. Instead, I sketch 
mathematics of complexity in a wider context, in order to make it applicable in the 
situations described in [Ve]. 

In sec. 1, I define the notion of Kolmogorov complexity relative to an admis- 
sible family of (partial) functions. The postulated properties of such admissible 
families should reflect our intuitive idea about library reuse and/or oracle-assisted 
computations. 

In sec. 2, I suggest a formalization of the notion of computations that
(potentially) produce admissible families. It turns out that categorical and operadic
notions are useful here, as was suggested in [Man1], Ch. IX.

Acknowledgements. I first learned about Zipf's Law from (an early version of) 
the paper [Ma] by D. Yu. Manin, and understood better the scope of my construc- 
tions trying to answer his questions. The possibility that Zipf's law reflects Levin's 
distribution occurred to me after looking at the graphs in the book [De] by S. De- 
haene. Professor Dehaene also kindly sent me the original paper [DeMe]. C. Calude 
read several versions of this article and stimulated a better presentation of my argu- 
ments. An operadic description of a set of programs for computation of (primitive) 
recursive functions was discussed in [Ya], and my old e-mail correspondence with 
N. Yanofsky helped me to clarify my approach to the formalization of the notions 
of reuse and oracles. Finally, L. Levin suggested several useful revisions. I am very 
grateful to all of them. 

1. Admissible sets of partial functions and relative complexity 

1.1. Notations and conventions. We recall here some basic conventions of
[Man1], Ch. V and IX. Let $A$, $Y$ be two sets. A partial function from $A$ to $Y$ is
a pair $(D(f), f)$ where $D(f) \subset A$, $f : D(f) \to Y$. We call $D(f)$ the domain of $f$,
and often write simply $f : A \to Y$. If $D(f) = \emptyset$, $f$ is called an empty function.
If $D(f) = A$, we sometimes call $f$ a total function. If $A$ is a one-element set, then
non-empty functions $A \to Y$ are canonically identified with elements of $Y$. Partial
functions can be composed in an obvious way: $D(g \circ f) := f^{-1}(D(g) \cap \mathrm{Im}(f))$.
Thus we may consider a category consisting of (some) sets and partial maps between
them.
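As a toy illustration of these conventions on finite sets, a partial function can be modelled by a Python dictionary whose keys form the domain $D(f)$. This representation, and the sample data, are my assumptions for the sketch; nothing here is specific to recursive functions.

```python
def compose(g: dict, f: dict) -> dict:
    """g o f, defined exactly on f^{-1}(D(g) ∩ Im(f))."""
    return {x: g[y] for x, y in f.items() if y in g}


f = {1: "a", 2: "b", 3: "c"}      # D(f) = {1, 2, 3}
g = {"a": 10, "c": 30, "d": 40}   # D(g) = {"a", "c", "d"}

h = compose(g, f)
print(h)                 # the composite, defined only where both steps succeed
print(compose(g, {}))    # composing with the empty function gives the empty function
```

The point 2 drops out of the domain of the composite because $f(2)$ lies outside $D(g)$.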
Put $\mathbf{Z}^+ :=$ the set of positive integers. Then $(\mathbf{Z}^+)^m$ for $m \ge 1$ can be identified
with the set of vectors $(x_1, \dots, x_m)$, $x_i \in \mathbf{Z}^+$. By definition, $(\mathbf{Z}^+)^0$ is a
one-element set, say, $\{*\}$. Any partial function $f : (\mathbf{Z}^+)^m \to (\mathbf{Z}^+)^n$ will be called an
$(m, n)$-function. For $m = 0$, such a non-empty function may and will be identified
with a vector from $(\mathbf{Z}^+)^n$.

Let $X$ be an infinite set. A structure of constructive world on $X$ is given by
a set $M_X$ of bijections $X \to \mathbf{Z}^+$, called structural numberings, such that any two
bijections in it are related by a (total) recursive permutation of $\mathbf{Z}^+$, and conversely,
any composition of a structural numbering with a recursive permutation is again a
structural numbering. Explicitly given finite sets are also considered as constructive
worlds. (A logically minded reader may imagine all our basic constructions taking
place at the ground floor of the von Neumann Universe.)

Intuitively, X can be imagined as consisting of certain finite Bourbaki struc- 
tures that can be unambiguously described and encoded by finite strings in a finite 
alphabet that form a decidable set of strings, and therefore also admit a natural 
numbering. Any two such natural numberings, of course, must be connected by a 
computable bijection. 

Morphisms between two constructive worlds, by definition, consist of those
set-theoretical maps which, after a choice of structural numberings, become partial
recursive functions. Thus, constructive worlds are objects of a category, the Constructive
Universe.

In order to introduce a formalization of oracle-assisted computations, we will 
have to extend the sets of morphisms allowing partial maps that might be non- 
computable. 

1.2. Admissible sets of functions. Consider a set $\Phi$ of partial functions
$f : (\mathbf{Z}^+)^m \to (\mathbf{Z}^+)^n$, $m, n \ge 0$. We will call $\Phi$ an admissible set, if it is countable
and satisfies the following conditions.

(i) $\Phi$ is closed under composition and contains all projections (forget some
coordinates), and embeddings (permute and/or add some constant coordinates).

Any $(m + 1, n)$-function can be considered as a family of $(m, n)$-functions $(u_k)$:
$u_k(x_1, \dots, x_m) := u(x_1, \dots, x_m, k)$. From (i) it follows that for any $u \in \Phi$ and
$k \in \mathbf{Z}^+$, also $u_k \in \Phi$. Similarly, if $u(x_1, \dots, x_m)$ is in $\Phi$, then

$$U(x_1, \dots, x_m, \dots, x_{m+n}) = u(x_1, \dots, x_m)$$

is in $\Phi$.

(ii) For any $(m, n)$, there exists an $(m+1, n)$-function $u \in \Phi$ such that the family
of functions $u_k : (\mathbf{Z}^+)^m \to (\mathbf{Z}^+)^n$ contains all $(m, n)$-functions belonging to $\Phi$.

We will say that such a function $u$ (or family $(u_k)$) is ample.

(iii) Let $f$ be a total recursive function whose image is decidable, and which defines
a bijection between $D(f)$ and the image of $f$. Then $\Phi$ contains both $f$ and $f^{-1}$.

From now on, $\Phi$ will always denote an admissible family.

1.3. Complexity relative to a family. Choose an $(m + 1, n)$-function $u \in \Phi$
and consider it as a family of $(m, n)$-functions $u_k$ as above. For any $(m, n)$-function
$f \in \Phi$ put $K_u(f) = \min\{k \mid f = u_k\}$. The r.h.s. is interpreted as $\infty$ if there is
no such $k$. We will call such a family $u$ Kolmogorov optimal in $\Phi$, if for any other
$(m + 1, n)$-family $v$ there is a constant $c_{u,v}$ such that for all $(m, n)$-functions $f \in \Phi$
we have $K_u(f) \le c_{u,v} K_v(f)$.
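A toy computation of complexity relative to a family may help to fix the definition. Everything concrete below is my assumption: the two families `u` and `v` are hypothetical, equality of functions is tested only on a finite sample (true equality of partial recursive functions is undecidable), and the search for the index is truncated, with `None` standing for $\infty$.

```python
SAMPLE = range(1, 6)  # finite stand-in for the domain Z+


def complexity(family, f, search_bound=1000):
    """min{k | f = family(., k)} on SAMPLE, or None if no k <= search_bound works."""
    target = [f(x) for x in SAMPLE]
    for k in range(1, search_bound + 1):
        if [family(x, k) for x in SAMPLE] == target:
            return k
    return None


def u(x, k):
    return k * x          # u_k is multiplication by k


def v(x, k):
    return 2 * k * x      # v_k is multiplication by 2k: a sparser family


def six(x):
    return 6 * x


def five(x):
    return 5 * x


print(complexity(u, six), complexity(v, six))    # index 6 in u, index 3 in v
print(complexity(u, five), complexity(v, five))  # 5x does not occur in v at all
```

In this example the inequality of the definition holds with constant 2 in one direction: the index of a function in `u` is at most twice its index in `v`, while no constant works in the opposite direction because `v` misses some functions.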

1.4. Theorem. a) If $\Phi$ contains an ample family of $(m + 1, n)$-functions, then
it contains also a Kolmogorov optimal family of $(m, n)$-functions.

b) If $u$ and $v$ are two Kolmogorov optimal families of $(m, n)$-functions, then

$$c_{v,u}^{-1} \le K_u(f)/K_v(f) \le c_{u,v}.$$



Proof. Let $\theta : \mathbf{Z}^+ \times \mathbf{Z}^+ \to \mathbf{Z}^+$ be a total recursive bijection between $\mathbf{Z}^+ \times \mathbf{Z}^+$
and a decidable subset of $\mathbf{Z}^+$. Assume moreover that $\theta(k, j) \le k \cdot \varphi(j)$ for some
$\varphi : \mathbf{Z}^+ \to \mathbf{Z}^+$. Choose any ample family $U \in \Phi$ of $(m + 1, n)$-functions and put

$$u(x_1, \dots, x_m, k) := U(x_1, \dots, x_m, \theta^{-1}(k)).$$

Then $u$ is ample and optimal, with the following bound for the constant $c_{u,v}$:

$$c_{u,v} \le \varphi(K_U(v)). \tag{1.1}$$

In fact, it suffices to consider such $v$ that $f$ occurs in $(v_k)$. Then

$$f(x_1, \dots, x_m) = v(x_1, \dots, x_m, K_v(f)) = U(x_1, \dots, x_m, K_v(f), K_U(v))$$

so that

$$K_u(f) \le \theta(K_v(f), K_U(v)) \le K_v(f)\, \varphi(K_U(v)).$$
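One concrete numbering satisfying the growth assumption of the proof is sketched below. The particular choice is mine, not the paper's: $\theta(k, j) = 2^{j-1}(2k - 1)$ is a bijection of $\mathbf{Z}^+ \times \mathbf{Z}^+$ onto the whole of $\mathbf{Z}^+$ (every positive integer is uniquely a power of two times an odd number), and it satisfies $\theta(k, j) \le k \cdot \varphi(j)$ with $\varphi(j) = 2^j$.

```python
def theta(k, j):
    """Pairing: linear in k for fixed j, exponential in j."""
    return 2 ** (j - 1) * (2 * k - 1)


def theta_inverse(n):
    """Recover (k, j) from n = 2^(j-1) * (2k - 1)."""
    j = 1
    while n % 2 == 0:
        n //= 2
        j += 1
    return (n + 1) // 2, j


def phi(j):
    return 2 ** j


for n in range(1, 9):
    print(n, theta_inverse(n))
```

With this choice the bound (1.1) is exponential in the index of $v$; the numberings constructed in 1.5.1 below are designed to make $\varphi$ grow much more slowly.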

1.5. Constants related to Kolmogorov complexity estimates. In the
inequality (1.1) estimating the dependence of Kolmogorov complexity on the choice
of encoding, two factors play the central roles.

One is $K_U(v)$. Its effective calculation depends on the possibility of translating
a program for $v$ into a program given by $U$. In the situation where $\Phi$ consists of
all partial recursive functions, such a compilation can be performed if $U$ satisfies
a property that is stronger than ampleness: cf. [Ro] and [Sch] where such families
are discussed and constructed.

Another factor is the growth rate of $\theta$. Below we will show how the task of
optimization of $\varphi$ can be seen in the context of Levin's distributions, reproducing
an argument from [Man2].

1.5.1. Slowly growing numberings of $(\mathbf{Z}^+)^2$. Let $R = (R_k \mid k \in \mathbf{Z}^+)$ be a
sequence of positive numbers tending to infinity with $k$. For $M \in \mathbf{Z}^+$, put

$$V_R(M) := \{(k, l) \in (\mathbf{Z}^+)^2 \mid k R_l \le M\}.$$

Clearly,

$$\mathrm{card}\, V_R(M) \le \sum_{l=1}^{\infty} \left[\frac{M}{R_l}\right] < \infty,$$

where $[a]$ denotes the integral part of $a$.
We have

$$V_R(M) \subset V_R(M + 1), \qquad (\mathbf{Z}^+)^2 = \bigcup_{M=1}^{\infty} V_R(M).$$

Therefore we can define a bijection $N_R : (\mathbf{Z}^+)^2 \to \mathbf{Z}^+$ in the following way:
$N_R(k, l)$ will be the rank of $(k, l)$ in the total ordering $<_R$ of $(\mathbf{Z}^+)^2$ determined
inductively by the following rule: $(i, j) <_R (k, l)$ iff one of the following alternatives
holds:

(a) $i R_j < k R_l$;

(b) $i R_j = k R_l$ and $j < l$.

1.5.2. Proposition. The numbering $N_R$ is well defined and has the following
property: all elements of $V_R(M + 1) \setminus V_R(M)$ have strictly larger ranks than those
of $V_R(M)$. Moreover:

(i) If the set $\{(q, l) \in \mathbf{Q} \times \mathbf{Z}^+ \mid q > R_l\}$ is enumerable (image of a partial recursive
function), then $N_R$ is computable (total recursive).

(ii) If the series $\sum_{l=1}^{\infty} R_l^{-1}$ converges and its sum is bounded by a constant $c$,
then

$$N_R(k, l) \le c\,(k R_l + 1). \tag{1.2}$$

(iii) If the series $\sum_{l=1}^{\infty} R_l^{-1}$ diverges, and

$$\sum_{l=1}^{M} R_l^{-1} \le F(M)$$

for a certain increasing function $F = F_R$, then

$$N_R(k, l) \le (k R_l + 1)\, F(k R_l + 1).$$

Proof. The first statements are an easy exercise. For the rest, notice that if $M$
is the minimal value for which $(k, l) \in V_R(M)$, we have $M - 1 < k R_l \le M$ and

$$N_R(k, l) \le \mathrm{card}\, V_R(M),$$

and in the case (ii) we have

$$\mathrm{card}\, V_R(M) \le M \sum_{m=1}^{\infty} R_m^{-1} \le c\,(k R_l + 1).$$

Similarly, in the case (iii) we have

$$\mathrm{card}\, V_R(M) \le M \sum_{m=1}^{M} R_m^{-1} \le (k R_l + 1)\, F(k R_l + 1).$$
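The numbering $N_R$ and the linear bound of case (ii) can be checked on a small example. The sequence below is a hypothetical choice of mine, $R_l = l^2$, for which the sum of the inverses is $\pi^2/6$; the rank is computed by directly counting the pairs that do not exceed $(k, l)$ in the ordering $<_R$.

```python
import math


def R(l):
    return l * l  # assumed sequence: the series of 1/R_l sums to pi^2 / 6


C = math.pi ** 2 / 6


def rank(k, l):
    """N_R(k, l): number of pairs (i, j) with (i, j) <_R (k, l), plus (k, l) itself."""
    t = k * R(l)
    total = 0
    j = 1
    while R(j) <= t:
        total += (t - 1) // R(j)          # pairs (i, j) with i * R_j < k * R_l
        if t % R(j) == 0 and j <= l:      # tie i * R_j = k * R_l, resolved by j <= l
            total += 1
        j += 1
    return total


for k, l in [(1, 1), (2, 1), (4, 1), (1, 2), (3, 3)]:
    print((k, l), rank(k, l), "<=", round(C * (k * R(l) + 1), 2))
```

The pairs $(4, 1)$ and $(1, 2)$ have the same product $k R_l = 4$; the tie is resolved by the second coordinate, so $(4, 1)$ precedes $(1, 2)$.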

1.6. L. Levin's probability distributions. From (1.2) one sees that any
sequence $\{R_l\}$ with converging $\sum_l R_l^{-1}$ can be used in order to construct the
bijection $\mathbf{Z}^+ \times \mathbf{Z}^+ \to \mathbf{Z}^+$, $(k, l) \mapsto N_R(k, l)$, linearly growing with respect to $k$. Assume that it is
computable and therefore can play the role of $\theta$ in (b).
In this case, for any $l$, the set of rational numbers $k/M < r_l := R_l^{-1}$ must be
decidable.

Even if we weaken the last condition, requiring only enumerability of the set
$k/M < r_l$ (i.e. asking each $r_l$ to be computable from below), the convergence of
$\sum_l r_l$ implies that there is a universal upper bound (up to a constant) for such $r_l$.
Namely, let $KP$ be the exponentiated prefix Kolmogorov complexity on $\mathbf{Z}^+$ defined
with the help of a certain optimal prefix enumeration (see [LiVi], [CaSt] for details).

1.6.1. Proposition ([Le]). For any sequence of computable from below
numbers $r_l$ with convergent $\sum_l r_l$, there exists a constant $c$ such that for all $l$,
$r_l \le c \cdot KP(l)^{-1}$.
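A simple computable sequence to which the Proposition applies comes from any explicit prefix code. As an illustration of my own choosing (the paper does not use it), take the Elias gamma code: the codeword of $n$ has length $2\lfloor \log_2 n \rfloor + 1$, the code is prefix-free, and the weights $r_n = 2^{-\mathrm{length}(n)}$ sum to 1. The Proposition then says that $KP(n)$ is bounded by a constant times $2^{\mathrm{length}(n)}$, which is of order $n^2$; an optimal prefix enumeration does much better.

```python
from fractions import Fraction


def elias_gamma(n):
    """Elias gamma codeword of a positive integer n, as a string of bits."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def weight(n):
    """r_n = 2^(-length of the codeword of n), as an exact fraction."""
    return Fraction(1, 2 ** len(elias_gamma(n)))


for n in (1, 2, 5, 16):
    print(n, elias_gamma(n), weight(n))

# Partial sums approach 1 from below.
print(sum(weight(n) for n in range(1, 2 ** 10)))
```

Exact rational arithmetic is used so that the partial sums can be compared with $1 - 2^{-K}$ without rounding error.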

More generally, L. Levin constructs in this way a hierarchy of complexity
measures associated with a class of abstract norms, functionals on sequences computable
from below.

As I explained in the Introduction, this paper suggests that these mathematical
distribution laws might lead to a new explanation of statistical properties of some
observable data.

2. The computability (pro)perads and admissible families 

2.1. Libraries, oracles, and operators. In this section, we will define
admissible sets $\Phi$ of partial $(m, n)$-functions formalizing intuitive notions of "software
libraries and their reuse" and "oracle-assisted computation". A Kolmogorov
complexity relative to such a set will include a formalization of the intuitive notion of
relative complexity $K(f \mid g)$ in the cases when $f$, $g$ are recursive ("reuse of $g$") and
when $g$ might be uncomputable ("oracle-assisted computation").

In this section our main objects are not functions but objects of higher types: 

(i) Programs for computing functions, possibly even names of oracles telling
us values of uncomputable functions, and programs including these names.

(ii) Operators, that is programs computing certain functions whose arguments 
and values are themselves programs. 

The main reason for such a shift of attention is this. Already the set of partial
recursive $(m, n)$-functions for $m \ge 1$ is not a constructive world, and neither are its
extensions with which we deal here. To the contrary, the set of programs calculating
recursive functions in a chosen programming language, such as Turing machines
or texts in a lambda-calculus, is constructive, but endowed with an uncomputable
equivalence relation: "two programs compute one and the same function". We will
call such worlds "programming methods".

More precisely, let $X$, $Y$ be two constructive worlds. A programming method is a
constructive world $P(X, Y)$ given together with a map $P(X, Y) \to \mathrm{ParSet}(X, Y)$,
$p \mapsto \bar{p}$, where ParSet is the category of sets with partial maps as morphisms. We
will systematically put a bar over the name of a program to denote the function
$\bar{p} : X \to Y$ which $p$ computes.

A good programming method must have additional coherence properties. For 
brevity, consider only infinite constructive worlds, and assume that P computes 
all recursive isomorphisms. We can then extend P to any two infinite construc- 
tive worlds, and it is natural to require the existence of at least two additional 
programs / operators 

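$$\mathrm{Ev} \in P(X \times P(X, Y), Y), \qquad \overline{\mathrm{Ev}}(x, p) := \bar{p}(x) \ \text{ for } x \in X,\ p \in P(X, Y), \tag{2.1}$$

$$\mathrm{Comp} \in P(P(X, Y) \times P(Y, Z), P(X, Z)), \qquad \overline{\mathrm{Comp}}(f, g) = g \circ f. \tag{2.2}$$

We will call objects such as Ev and Comp operators and say that they lift the
respective operations on functions: evaluation at a point and composition.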
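A deliberately naive model of a programming method may make the distinction between a program and the function it computes more tangible. In the sketch below, which is entirely my own illustration, programs are source strings of one-argument Python lambda expressions, the bar map is `eval`, and the operators lifting evaluation and composition act on program texts. The names `bar`, `ev` and `comp` are hypothetical, and `eval` is acceptable here only because the snippets are hand-written and trusted.

```python
def bar(p: str):
    """The map p -> p-bar: the function computed by the program text p."""
    return eval(p)  # toy interpreter for trusted lambda expressions only


def ev(x, p: str):
    """Lift of evaluation at a point: run program p on input x."""
    return bar(p)(x)


def comp(p: str, q: str) -> str:
    """Lift of composition: returns a program text computing bar(q) o bar(p)."""
    return f"lambda x: ({q})(({p})(x))"


p = "lambda x: x + 1"
q = "lambda x: 2 * x"
r = "lambda x: x + x"      # a different text computing the same function as q

print(comp(p, q))
print(ev(3, comp(p, q)))
print(q == r, all(ev(x, q) == ev(x, r) for x in range(1, 20)))
```

The last line shows two distinct programs that agree on every tested input: equality of texts is decidable, whereas equality of the computed functions is not, which is the uncomputable equivalence relation mentioned above.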

For more detailed mathematical background, cf. [Man1], Ch. IX, sec. 3-5.

In the following we start with a certain programming method P computing 
at least all partial recursive (m, n)-functions, and describe ways of extending it 
necessary to formalize the notions of library reuse and oracles. 

2.2. Constructing admissible sets. Each such set $\Phi$ will be defined in the
following way.

(i) Choose a constructive world $S$ consisting of programs for computing
$(m, n)$-functions. It will be a union of two parts: (programs of) elementary functions and
library functions. All elementary functions will be (Turing) computable, i.e. (partial)
recursive. Library functions must form a constructive world (possibly, finite),
with some fixed numbering. The number of a library program is called its address.
Both library functions and elementary functions will be decidable subsets of $S$.

(ii) Define a set of operators that can be performed on finite strings of partial
$(m_i, n_i)$-functions. They will be obtained by iterating several basic operators, such
as Comp and Ev above. The world $OP$ of programs/names of such operators will
be easy to encode by a certain constructive world of labelled graphs.

(iii) Take an element $P \in OP$ and specify a finite string $s := (f_1, \dots, f_r)$ of
(addresses of) functions from $S$ that can serve as input to $P$. The pair $(P, s)$ is
then a program for calculation of a concrete string of $(m, n)$-functions.

(iv) Finally, $\Phi$ will be defined as the minimal set of functions computable by the
programs in $S$ and closed with respect to the applications of all operators in $OP$.

We will now give our main examples of objects informally described in (i)-(iv). 

2.3. The language of directed graphs. In our world $OP$, each operator
$\rho$ can take as input a finite sequence of partial functions $f_i : (\mathbf{Z}^+)^{m_i} \to (\mathbf{Z}^+)^{n_i}$
($m_i, n_i \ge 0$, $i = 1, \dots, k$) and produce from it another finite sequence of partial
functions $g_i : (\mathbf{Z}^+)^{p_i} \to (\mathbf{Z}^+)^{q_i}$ ($p_i, q_i \ge 0$, $i = 1, \dots, l$). We will call the signature
of $\rho$ the family

$$\mathrm{sign}(\rho) := [(m_1, n_1), \dots, (m_k, n_k); (p_1, q_1), \dots, (p_l, q_l)]. \tag{2.3}$$

Two operations $\rho$, $\sigma$ can be composed if their signatures match: $\rho \circ \sigma$ takes as input
the output of $\sigma$.

Below we will describe explicitly a set of basic operations $OP_0$. After that the
whole set $OP$ will be defined as the minimal set of operations containing $OP_0$ and
closed under composition.

A visually convenient representation of elements of OP is given by (isomorphism 
classes of) directed labeled graphs: see formal definitions in [BoMan], Section 1. 

More precisely, each basic operator $\rho$ of signature (2.3) is represented by a corolla:
a graph with one vertex labelled by $\rho$, $k$ flags oriented towards the vertex (inputs)
and $l$ flags oriented from the vertex (outputs). Moreover, inputs and outputs must
be totally ordered, and labelled by the respective pairs $(m_i, n_i)$.

Labelled directed graphs with more vertices are obtained from a disjoint finite set
of corollas by grafting some outputs to some inputs. Each grafted pair (output, input)
must have equal labels $(m, n)$. Flags that remain ungrafted form the inputs/outputs
of the total graph.

Notice that directed graphs ([BoMan], 1.3.2) do not admit oriented loops. 

2.4. Basic operators. In this subsection, we describe operations on strings
of functions that must be represented by the respective operators. For brevity, we
will denote these operators by single Greek letters.

(a) Composition of functions. This basic operator, say $\gamma$, has signature of the
form $[(m, n), (n, q); (m, q)]$. It produces from two partial functions
$f : (\mathbf{Z}^+)^m \to (\mathbf{Z}^+)^n$ and $g : (\mathbf{Z}^+)^n \to (\mathbf{Z}^+)^q$ their composition $g \circ f : (\mathbf{Z}^+)^m \to (\mathbf{Z}^+)^q$. Recall
that $D(g \circ f) := f^{-1}(D(g))$. It is a special case of the operator Comp.

(b) Juxtaposition. This basic operator, say $\alpha$, has signature of the form

$$[(m, n_1), \dots, (m, n_k); (m, n_1 + \dots + n_k)].$$

It produces from $k$ partial functions $f_i : (\mathbf{Z}^+)^m \to (\mathbf{Z}^+)^{n_i}$ the function $(f_1, \dots, f_k)$:

$$D((f_1, \dots, f_k)) = D(f_1) \cap \dots \cap D(f_k),$$

$$(f_1, \dots, f_k)(x_1, \dots, x_m) = (f_1(x_1, \dots, x_m), \dots, f_k(x_1, \dots, x_m)).$$

(c) Recursion. This basic operator, say $\rho$, has signature of the form 

$$[(m, 1),\ (m + 2, 1);\ (m + 1, 1)].$$

It produces from partial functions $f : (\mathbf{Z}^+)^m \to \mathbf{Z}^+$ and $g : (\mathbf{Z}^+)^{m+2} \to \mathbf{Z}^+$ the 
function $h : (\mathbf{Z}^+)^{m+1} \to \mathbf{Z}^+$ such that 

$$h(x_1, \dots, x_m, 1) = f(x_1, \dots, x_m),$$

$$h(x_1, \dots, x_m, k + 1) = g(x_1, \dots, x_m, k, h(x_1, \dots, x_m, k))$$

for $k \ge 1$. 

The definition domain $D(h)$ is also defined by recursion: 

$$(x_1, \dots, x_m, 1) \in D(h) \iff (x_1, \dots, x_m) \in D(f),$$

$$(x_1, \dots, x_m, k + 1) \in D(h) \iff (x_1, \dots, x_m, k) \in D(h) \ \text{and}\ (x_1, \dots, x_m, k, h(x_1, \dots, x_m, k)) \in D(g)$$

for $k \ge 1$. 
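
The recursion scheme and its domain rule can be sketched in Python as follows. As an assumption of this example, a partial $(m,1)$-function is modelled as a callable returning `None` outside its definition domain; the name `recursion` and the example `add` are hypothetical.

```python
from typing import Callable, Optional, Tuple

Vec = Tuple[int, ...]
# A partial (m, 1)-function: returns None outside its definition domain.
PartialNum = Callable[[Vec], Optional[int]]


def recursion(f: PartialNum, g: PartialNum) -> PartialNum:
    """rho: builds h with h(x, 1) = f(x) and h(x, k+1) = g(x, k, h(x, k))."""
    def h(args: Vec) -> Optional[int]:
        *xs, k = args
        x = tuple(xs)
        value = f(x)  # h(x, 1)
        for j in range(1, k):
            if value is None:
                # (x, j) is outside D(h), hence so is every later point
                return None
            value = g(x + (j, value))  # h(x, j + 1)
        return value
    return h


# Addition on Z^+: add(x, 1) = x + 1, add(x, k + 1) = add(x, k) + 1.
add = recursion(lambda x: x[0] + 1, lambda a: a[2] + 1)
```

The loop mirrors the recursive description of $D(h)$: once some $h(x, j)$ is undefined, all values $h(x, k)$ with $k > j$ are undefined as well.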

(d) Operator $\mu$. Its signature is $[(n + 1, 1); (n, 1)]$. Given an $(n + 1, 1)$-function 
$f$, it produces the $(n, 1)$-function $h$ with the definition domain 

$$D(h) = \{(x_1, \dots, x_n) \mid \exists\, x_{n+1} \ge 1 \ \text{such that}\ f(x_1, \dots, x_n, x_{n+1}) = 1 \ \text{and}\ (x_1, \dots, x_n, k) \in D(f) \ \text{for all}\ k \le x_{n+1}\}.$$

On the definition domain, 

$$h(x_1, \dots, x_n) = \min\{x_{n+1} \mid f(x_1, \dots, x_{n+1}) = 1\}.$$
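
A minimal Python sketch of the $\mu$ operator follows. A faithful implementation would simply fail to terminate outside $D(h)$; to keep the example runnable, the search is cut off after `max_steps` candidates, which is an assumption of this sketch and not part of the definition. Partial functions are again modelled as callables returning `None` outside their domain, and `ceil_sqrt_test` is a hypothetical example.

```python
from typing import Callable, Optional, Tuple

Vec = Tuple[int, ...]
# A partial (n, 1)-function: returns None outside its definition domain.
PartialNum = Callable[[Vec], Optional[int]]


def mu(f: PartialNum, max_steps: int = 10_000) -> PartialNum:
    """Least k >= 1 with f(x, k) = 1, provided f is defined at all smaller k."""
    def h(x: Vec) -> Optional[int]:
        for k in range(1, max_steps + 1):
            value = f(x + (k,))
            if value is None:
                # f is undefined before a witness is found: x is not in D(h)
                return None
            if value == 1:
                return k
        # Cut-off reached; the genuine operator would diverge here.
        return None
    return h


# Hypothetical example: f(x, k) = 1 exactly when k * k >= x,
# so mu(f)(x) is the ceiling of the square root of x.
def ceil_sqrt_test(a: Vec) -> Optional[int]:
    x, k = a
    return 1 if k * k >= x else 2


ceil_sqrt = mu(ceil_sqrt_test)
```

Note that the cut-off makes "undefined" and "no witness found within the bound" indistinguishable, which is exactly the information a real unbounded search cannot provide in finite time.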



(e) Identity operation $\iota$. 

2.5. The constructive world of operations OP. Returning now to sec. 
2.2, we define OP as the set of finite directed labelled graphs with totally ordered 
inputs/outputs at each vertex satisfying the following condition: each vertex is 
labelled by one of the letters $\gamma, \sigma, \rho, \mu, \iota$ and its inputs/outputs are labelled by the 
respective components of the relevant signature. 

We can choose any one of the standard ways to encode such a graph by a string over 
a fixed finite alphabet, and then define a structure numbering of OP by ranking 
these words in alphabetic order. The subset of well-formed strings that encode 
graphs ought to form a decidable subset of all strings, and all natural functions 
such as 

graph $\mapsto$ sequence of all inputs of the graph with their $(m, n)$-labellings 

ought to be total recursive. 
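
A structure numbering of this kind can be sketched in Python. The sketch assumes that "alphabetic order" means ordering first by length and then alphabetically, so that every word receives a finite rank; the decidable predicate `balanced` is a toy stand-in for "the string encodes a graph" and is not the encoding used in the text.

```python
from itertools import count, islice, product
from typing import Callable, Iterator, Tuple


def shortlex(alphabet: str) -> Iterator[str]:
    """All non-empty words over the alphabet, by length, then alphabetically."""
    letters = sorted(alphabet)
    for length in count(1):
        for word in product(letters, repeat=length):
            yield "".join(word)


def numbering(alphabet: str,
              well_formed: Callable[[str], bool]) -> Iterator[Tuple[int, str]]:
    """Rank the well-formed words 1, 2, 3, ... in the order they appear."""
    rank = 0
    for word in shortlex(alphabet):
        if well_formed(word):
            rank += 1
            yield rank, word


def balanced(word: str) -> bool:
    # Toy decidable predicate: balanced strings of parentheses.
    depth = 0
    for ch in word:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


first_three = list(islice(numbering("()", balanced), 3))
```

Decidability of the well-formedness predicate is what makes the map from rank to word total recursive: each step of the enumeration terminates.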

In the final count, each element of OP determines an operation on finite sets of 
partial functions, producing another finite set of partial functions. Moreover, OP 
can be enriched to a free (pro)perad acting on finite strings of partial functions. 
If the signature of this string does not match the signature of the inputs of the 
operation, we may and will agree that the operation produces an empty function. 
The enrichment however requires some care and higher categorical constructions. 

2.6. Basic partial functions. Let S be a constructive world of programs/oracles 
calculating partial $(m, n)$-functions. Denote by OP(S) the minimal set of such 
programs containing S and closed with respect to application to them of operators from 
OP. In the remainder of this paper, we will always include in S a set of basic 
recursive functions $S_{rec}$, such that $OP(S_{rec})$ consists of all (partial) recursive functions. 
In [Man1], Ch. V, Sec. 2, the following set is chosen: 

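$$\mathrm{suc} : \mathbf{Z}^+ \to \mathbf{Z}^+, \quad x \mapsto x + 1,$$

$$1^{(n)} : (\mathbf{Z}^+)^n \to \mathbf{Z}^+, \quad (x_1, \dots, x_n) \mapsto 1, \quad n \ge 0,$$

$$\mathrm{pr}^n_i : (\mathbf{Z}^+)^n \to \mathbf{Z}^+, \quad (x_1, \dots, x_n) \mapsto x_i, \quad n \ge 1.$$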
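
For concreteness, the three families of basic functions can be written as a short Python sketch. The names `const_one` and `pr`, and the convention of passing arguments and results as tuples, are assumptions of this example.

```python
from typing import Callable, Tuple

Vec = Tuple[int, ...]


def suc(x: Vec) -> Vec:
    """Successor on Z^+."""
    return (x[0] + 1,)


def const_one(n: int) -> Callable[[Vec], Vec]:
    """The constant function 1^(n) on (Z^+)^n, for n >= 0."""
    def f(x: Vec) -> Vec:
        if len(x) != n:
            raise ValueError("wrong number of arguments")
        return (1,)
    return f


def pr(n: int, i: int) -> Callable[[Vec], Vec]:
    """The projection of (Z^+)^n onto its i-th coordinate, 1 <= i <= n."""
    if not 1 <= i <= n:
        raise ValueError("index out of range")
    def f(x: Vec) -> Vec:
        if len(x) != n:
            raise ValueError("wrong number of arguments")
        return (x[i - 1],)
    return f
```

All three are total functions, so partiality enters only through the operators of 2.4, in particular through $\mu$.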

2.7. Admissible sets and library reuse. The standard Kolmogorov complexity 
of partial recursive functions is defined relative to the admissible set of functions 
$\Phi$ computable by programs from $OP(S_{rec})$. If we include into the set S only some 
programs of recursive functions, then the total set of computable functions $\Phi$ will 
not grow, but some functions will become computable by much shorter programs, because 
the cost of writing out a library program can be disregarded. 

2.8. Admissible sets including oracles for uncomputable functions. 

Here one more complication arises: the requirement (ii) in our definition of the 
admissible sets of functions is not satisfied automatically in the world OP(S), as 
it was in the case when the respective set of functions consisted only of recursive 
functions. 

In order to remedy this, we have to add to the list of basic operators the operator 
Ev from (2.1). Its participation in the iteration of our former basic operations 
cannot be described by labelled graphs in an obvious way, so a more systematic 
treatment is required: seemingly, we are in the realm of the "expanding constructive 
universe", some propaganda for which was made in [Man1], Ch. IX, sec. 3. 